Method, system and apparatus for capture-based immersive telepresence in virtual environment

ABSTRACT

An apparatus for capture-based telepresence in a virtual environment session comprises 3D capture devices to continuously capture image data of a first user. A processing unit comprises: an image processor for continuously producing a first 3D representation of the first user based on the image data, the first 3D representation having at least punctually a user specific position in a virtual environment. A data transmitter transmits the first 3D representation for remote use in the virtual environment session. A data receiver continuously receives a second 3D representation of a second user and image data for the virtual environment, and at least punctually receives a user specific position of the second 3D representation relative to the virtual environment. A rendering engine outputs for display the 3D representation of the second user positioned relative to the first user as inserted in the virtual environment based on said user specific positions. A method for participating in a virtual environment session is also provided.

CROSS-REFERENCE TO RELATED APPLICATION

The present application claims priority on U.S. Provisional Patent Application Ser. No. 61/878,796, filed on Sep. 17, 2013.

TECHNICAL FIELD

The present application relates to virtual reality with telepresence capability.

BACKGROUND OF THE ART

Due to the mass availability of networked computers with ever-increasing processing power and of peripherals such as webcams and microphones, imaged telecommunications have evolved and are commonly available and used on a global scale, for instance as Skype™, Facetime™ and the like. These technologies operate real-time photorealistic telecommunications by which participants and their background environments are transmitted in real time, allowing participants remote from each other to communicate visually. Such systems have limitations, notably the 2D (two-dimensional) rendering of images and the stagnant background of the participants.

Other communication and collaboration platforms, such as Second Life™ or World of Warcraft™, propose shared on-line virtual 3D environments (persistent worlds) where participants, appearing as avatars, are able to move around, operate, transform the space, meet and collaborate with one another. The avatars are virtual objects and hence do not allow non-verbal communication cues such as eye contact, posturing and spatial positioning, which cues could enhance inter-personal communication and group collaboration. Another limitation of all of the above systems is the use of standard flat screen monitors, which limits immersion in spite of the virtual 3D environments.

SUMMARY

It is therefore an object of the present disclosure to provide a method, system and apparatus for a capture-based representation of participants in an immersive telepresence virtual environment session.

Therefore, in accordance with an embodiment of the present disclosure, there is provided an apparatus for capture-based telepresence in a virtual environment session, comprising: at least one three-dimensional (3D) capture device to continuously capture image data of a first user; a processing unit comprising at least: an image processor for continuously producing a first 3D representation of the first user based on the image data, the first 3D representation having at least punctually a user specific position in a virtual environment; a data transmitter for transmitting the first 3D representation for at least a remote use in the virtual environment session; a data receiver for continuously receiving a second 3D representation of a second user, image data for the virtual environment, and at least punctually receiving user specific position of the second 3D representation relative to the virtual environment; and a rendering engine for outputting at least the 3D representation of the second user positioned relative to the first user as inserted in the virtual environment based on said user specific positions; at least one display device for displaying the output.

Further in accordance with the embodiment of the present disclosure, the at least one display device comprises an immersive image screen deployed in up to 360° around the participant.

Still further in accordance with the embodiment of the present disclosure, the at least one display device is one of a hemispherical screen, a frusto-spherical screen, a cylindrical screen, a set of flat screens, a tablet and a head-mounted display with orientation tracking, with the first user physically located in a central portion of said at least one display device.

Still further in accordance with the embodiment of the present disclosure, three of the 3D capture devices are provided, with the first user physically located in a central portion relative to the 3D capture devices surrounding the first user.

Still further in accordance with the embodiment of the present disclosure, an audio capture device continuously captures audio data emitted at least by the first user, for transmission via the data transmitter for at least a remote use in the virtual environment session; an audio driver produces audio content of the second user received by the data receiver; and speakers output the audio content.

Still further in accordance with the embodiment of the present disclosure, the audio capture device and the speakers are part of a multi-channel audio system deployed in up to 360° around the participant.

Still further in accordance with the embodiment of the present disclosure, a command processor receives commands from the first user for transmission via the data transmitter, for interfacing with the virtual environment session.

Still further in accordance with the embodiment of the presentdisclosure, an interface communicates with the command processor.

Still further in accordance with the embodiment of the presentdisclosure, the first 3D representation has at least punctually a userspecific position and orientation in the virtual environment, andfurther wherein the data receiver receives at least punctually a userspecific position and orientation of the second user, the renderingengine outputting the 3D representation of the second user orientedrelative to the virtual environment.

Still further in accordance with the embodiment of the presentdisclosure, the rendering engine outputs a reference orientationlandmark for calibration of the first user's orientation in the virtualenvironment, and wherein the user specific orientation of the first useris based on the reference orientation landmark.

Still further in accordance with the embodiment of the presentdisclosure, the rendering engine outputs a mirror image of the processed3D representation of the first user as the reference orientationlandmark, the mirror image being orientable to set the user specificorientation of the first user.

Still further in accordance with the embodiment of the presentdisclosure, the at least one 3D capture device continuously captures a3D point cloud of the first user.

Still further in accordance with the embodiment of the presentdisclosure, the at least one 3D capture device continuously captures the3D point cloud of the first user with chromatic and luminance data.

Still further in accordance with the embodiment of the presentdisclosure, there is provided a system for operating a virtualenvironment session, comprising: at least two of the apparatus describedabove; and a virtual environment server comprising: a virtualenvironment manager for providing the image data for the virtualenvironment and for managing a coordinate system of a virtualenvironment based on at least on the user specific position andorientation of the users in the virtual environment.

Still further in accordance with the embodiment of the presentdisclosure, the virtual environment server further comprises an assetmanager for recording and storing modifications to the virtualenvironment.

Still further in accordance with the embodiment of the presentdisclosure, the virtual environment server further comprises a controlunit for administrating virtual environment sessions, the administratingcomprising at least one of login, troubleshooting, technical support,calibrating, events synchronisation, stream monitoring, accessprivileges/priority management.

In accordance with another embodiment of the present disclosure, thereis provided a method for participating in a virtual environment sessioncomprising continuously receiving image data of a first user;continuously producing a first 3D representation of the first user basedon the image data, the first 3D representation having at leastpunctually a user specific position and orientation in a virtualenvironment; transmitting the first 3D representation for at least aremote use in the virtual environment session; continuously receiving asecond 3D representation of a second user, image data for the virtualenvironment, and at least punctually receiving user specific position ofthe second 3D representation relative to the virtual environment; andoutputting for display at least the 3D representation of the second userpositioned relative to the first user as inserted in the virtualenvironment based on said user specific positions.

Further in accordance with the other embodiment of the present disclosure, there is provided continuously receiving audio data emitted at least by the first user and transmitting the audio data for at least a remote use in the virtual reality session.

Still further in accordance with the other embodiment of the present disclosure, there is provided continuously receiving and producing audio content of the second user received by the data receiver.

Still further in accordance with the other embodiment of the present disclosure, there is provided receiving commands from the first user and transmitting the commands for interfacing with the virtual environment session.

Still further in accordance with the other embodiment of the present disclosure, there is provided receiving at least punctually a user specific orientation of the second user, and outputting for display the 3D representation of the second user oriented relative to the virtual environment based on the user specific orientation of the second user.

Still further in accordance with the other embodiment of the present disclosure, there is provided outputting a reference orientation landmark for calibration of the first user's orientation in the virtual environment, the user specific orientation of the first user based on the reference orientation landmark.

Still further in accordance with the other embodiment of the present disclosure, outputting the reference orientation landmark comprises outputting a mirror image of the 3D representation of the first user as the reference orientation landmark, the mirror image being orientable to set the user specific orientation of the first user.

Still further in accordance with the other embodiment of the present disclosure, continuously receiving image data comprises continuously receiving a 3D point cloud of the first user.

Still further in accordance with the other embodiment of the present disclosure, continuously receiving a 3D point cloud of the first user comprises receiving chromatic data for points of the 3D point cloud.

Still further in accordance with the other embodiment of the present disclosure, continuously producing a first 3D representation of the first user comprises downsampling the 3D point cloud by removing redundant points.

Still further in accordance with the other embodiment of the presentdisclosure, continuously producing a first 3D representation of thefirst user comprises filtering the 3D point cloud by removing outlierpoint.

Still further in accordance with the other embodiment of the presentdisclosure, continuously producing a first 3D representation of thefirst user comprises creating a 3D mesh using the 3D point cloud.

Still further in accordance with the other embodiment of the presentdisclosure, continuously producing a first 3D representation of thefirst user further comprises projecting chromatic data of the 3D pointcloud on the 3D mesh.

DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a system for hosting a virtual reality session with photorealistic telepresence capability;

FIG. 2 is a block diagram of a station of the system of FIG. 1;

FIG. 3 is a schematic view illustrating an embodiment of the station of FIG. 2, involving flat screens;

FIG. 4 is a schematic view showing an embodiment of the station of FIG. 2, with a hemispherical screen;

FIG. 5 is a block diagram of a virtual reality server of the system of FIG. 1;

FIG. 6 are schematic elevation and plan views illustrating an embodiment of the station of FIG. 2, involving a cylindrical screen;

FIG. 7 are schematic elevation and plan views illustrating an embodiment of the station of FIG. 2, involving a handheld screen;

FIG. 8 are schematic elevation and plan views illustrating an embodiment of the station of FIG. 2, involving a head-mounted display;

FIG. 9 are schematic elevation and plan views illustrating an embodiment of the station of FIG. 2, involving a hemispherical screen;

FIG. 10 are schematic elevation and plan views illustrating an embodiment of the station of FIG. 2, involving a spherical screen;

FIG. 11 is a schematic view of the system of FIG. 1 for hosting a virtual reality session with photorealistic telepresence capability;

FIG. 12 is a schematic plan view of the system of FIG. 4, with a user performing an orientation calibration; and

FIG. 13 is a flowchart illustrating a method for participating in a virtual environment session in accordance with the present disclosure.

DETAILED DESCRIPTION OF THE EMBODIMENTS

Referring to the drawings and more particularly to FIG. 1, there is illustrated at 10 a system for hosting a virtual reality session with photorealistic (capture-based) telepresence capability for participants, where participants are incorporated (as 2D or 3D entities) in a shared virtual environment. The expression “photorealistic” may be defined as incorporating realistic morphological cues of a participant's anatomy as obtained by capturing devices, the morphological cues including eye contact, facial expressions, body language, positioning and posture. The resolution may vary, but the photorealism of the system 10 is sufficient to recognize some of the morphological cues mentioned above. In an embodiment, the participants are not simply overlaid on a flat 360° panorama, but incorporated in a 3D virtual world where they may move around while the scene is constantly being redrawn to match their point of view. For simplicity purposes, reference will be made hereinafter to the system 10. The system 10 hosts a virtual environment session A. The virtual environment session A may be one in which an immersive 360-degree virtual environment featuring the entire or a substantial part of the horizon is displayed, with participants being virtually present in the session A as photorealistic 2D or 3D (three-dimensional) representations of themselves, in real time or quasi-real time. Moreover, the virtual environment session A may allow the participants to communicate audibly with one another, although the audible communication is optional. In the virtual environment session A, spatialization of objects, including the participants, is respected, with each object, including the participants, having a specific location within the virtual environment, with dynamic modifications being represented in real time. In an embodiment, the participants are represented as 3D photorealistic representations, which allows an expression of the participants' body language by the representation of orientation, position and movements. The virtual environment sessions A may provide a permanent or transient virtual environment for the participants to meet in a context where non-verbal communication cues, such as eye contact and posturing, are reproduced.

The shared virtual environment may be anything from a blank space to a content-rich and/or generative metaworld whose characteristics will be the subject of the encounters or, more simply, serve as contexts for encounters, conversations, collaborative projects, etc. These shared environments may be a capture-based (photorealistic) environment (e.g., captured images and photogrammetry or video of a real location), a synthetic environment, or a combination thereof.

The characteristics of these virtual environments may support uses such as social network environments, virtual meeting spaces, informative/educational spaces, training and simulation environments, gaming environments, brainstorming sandboxes (drawing spaces), real-world monitoring environments (control rooms), real-time data visualisation and analysis environments, complex system modelization environments, etc.

The control offered to each participant may include mirror functions to calibrate the participant's representation and navigation functions to let a participant move within the space. Other toolbox-like functions, such as drawing, form generation and placement, and audio, visual or textual recording, may let the participant interact with and transform the environment.

Access to parameters for the control of privacy (visibility), access (public/private spaces, home settings), editing (personal content management), tools and shortcuts may also be included.

In FIG. 1, the virtual environment session A (i.e., the virtual environment) is displayed at stations 12, also known as the apparatuses 12. Three different stations 12 are illustrated in FIG. 1, but the virtual environment session A may involve a single one of the stations 12 or more than the three stations 12 shown in FIG. 1. As the telecommunication between stations 12 may be network-based or cloud-based, other stations 12 may join in, or several different virtual reality sessions may take place simultaneously. The stations 12 are discrete locations in which a participant is captured visually to then be represented in the virtual environment session A, and at which participants become immersed within the shared virtual environment session A. The stations 12 will be described in further detail hereinafter.

Still referring to FIGS. 1 and 11, a virtual environment server 14 operates the virtual environment session A, notably by providing and updating a shared virtual environment, including a position and/or orientation of the participants. The server 14 may also be used to manage and control the virtual environment session A. In an embodiment, the processing performed by the virtual environment server 14 may be performed instead by the stations 12. However, the configuration of FIG. 1 is well suited to reduce the processing performed by the stations 12, by having one or more servers acting as the virtual environment server(s) 14, with a cloud-based approach that may be more robust, as shown in FIG. 11. The system 10 as shown in FIG. 1 may have additional components.

Referring to FIG. 2, there is illustrated one of the stations 12 in greater detail. The station 12 comprises the hardware forming a physical environment to accommodate a participant, and the hardware and software to locally host the virtual environment session A. In other words, the station 12 is responsible for capturing images, sound and/or control data of the local participant and for transmitting this information to the virtual environment session A. Likewise, the station 12 is responsible for outputting visually and audibly the constantly updated virtual environment session A for the participant locally occupying the station 12.

Accordingly, the station 12 comprises a processing unit 20. The processing unit 20 is typically a computer (e.g., a personal computer, such as a desktop or laptop) that is connected to a plurality of peripherals to perform the functions stated above. The processing unit 20 incorporates a plurality of software modules that will enable the station 12 to locally host and participate in the virtual environment session A. The processing unit 20 thus has sufficient processing speed to host the virtual environment session A, considering that the computer processing may apply to only a subset of the whole session.

Referring concurrently to FIGS. 2 and 3, the station 12 has capture devices 21, such as 3D capture devices, for capturing image data to subsequently produce a photorealistic 3D representation of the participant(s). Although the station 12 may operate with a single one of these devices, an embodiment has at least three of these 3D capture devices 21 in order to produce the photorealistic 3D representation of the participant in the station 12. For instance, the 3D capture devices 21 are at least three point-cloud capture units, such as the Kinect™ or PrimeSense™, that may be equidistantly disposed horizontally around the participant, disposed to capture the entirety of the participant or group of participants. These 3D capture devices 21 concurrently produce a cloud of points that is subsequently used to model the participant, the point cloud being a set of points described by their position in space (three components, {x, y, z}). The cloud of points may be linked to an RGB texture and a depth map. The cloud of points may contain enough data for the model to produce a photorealistic 3D representation of the participant; the point cloud data can also be used to create a mesh onto which the RGB textures obtained from the same 3D capture devices 21 are applied as a video texture, as described hereinafter. The resulting model may also have a node to orient and position the participant within the virtual environment of the session A. Suitable lighting may also be provided to ensure that the image capture is of quality.
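By way of a non-limiting illustration, the per-device point clouds may be represented as arrays of positions and colours and merged into a single cloud in a shared station frame. The sketch below assumes Python with NumPy, per-device extrinsic matrices obtained from a prior calibration, and synthetic data in place of actual device output; none of these names or conventions are prescribed by the present disclosure.

```python
import numpy as np

def merge_point_clouds(frames, extrinsics):
    """Merge per-device point clouds into one cloud in a shared station frame.

    frames     -- list of (points, colors) tuples; points is (N, 3) float32 in the
                  device's own coordinate system, colors is (N, 3) uint8 RGB.
    extrinsics -- list of 4x4 matrices mapping each device frame to the shared
                  station frame (obtained from a prior calibration step).
    """
    merged_pts, merged_rgb = [], []
    for (pts, rgb), T in zip(frames, extrinsics):
        homo = np.hstack([pts, np.ones((pts.shape[0], 1), dtype=pts.dtype)])
        merged_pts.append((homo @ T.T)[:, :3])   # apply the rigid transform
        merged_rgb.append(rgb)
    return np.vstack(merged_pts), np.vstack(merged_rgb)

# Synthetic data standing in for three capture devices.
rng = np.random.default_rng(0)
frames = [(rng.normal(size=(1000, 3)).astype(np.float32),
           rng.integers(0, 256, size=(1000, 3), dtype=np.uint8)) for _ in range(3)]
extrinsics = [np.eye(4, dtype=np.float32) for _ in range(3)]
points, colors = merge_point_clouds(frames, extrinsics)
print(points.shape, colors.shape)  # (3000, 3) (3000, 3)
```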

As another example, technologies such as time-of-flight tracking or photogrammetry may be used to capture sufficient image data to produce a 3D representation of the participant, as described hereinafter.

An audio capture device 22 is also provided in the station 12. The audio capture device 22 may take various forms and is used to record sounds produced by the participant of the station 12 and his/her environment. For instance, one or more microphones may be used in any appropriate configuration.

An interface 23, such as a handheld device (e.g., remote control, smartphone, etc.), may be provided for the participant to perform commands related to the virtual environment session A. Among various possibilities, the interface 23 may be used by the participant to modify his/her position within the virtual environment, i.e., to move within the environment. The interface 23 may also be used for the participant to draw and/or displace objects of the virtual environment, to operate various actions, to operate the processing unit 20, etc. Hence, if the interface 23 is a handheld device, appropriate command buttons, sliders and the like, grouped into a variety of screens, are provided thereon (e.g., an application on a smartphone to control the virtual environment). In an embodiment, the interface 23 may be the participant himself/herself, with a part of the body of the participant that projects eccentrically from a remainder of the body being recognized as a desired interfacing action (gesture control). For instance, an extended arm may be regarded as an interfacing action and hence be identified as such by the processing unit 20. The interface 23 may provide a mirror function to let a participant adjust his/her representation in order to align its position in relation to the world and thus calibrate eye contact and coherent body language, such as positioning and pointing in relation to the shared environment. Other functions of the interface may include menus for selecting, joining and leaving virtual reality sessions, or for setting parameters allowing control of privacy (visibility), access (public/private spaces, home setting), editing (personal content management), tools and shortcuts.
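As an illustration of the gesture-control variant mentioned above, a minimal sketch of recognizing an extended arm from tracked joint positions is given below. The joint names, threshold value and coordinate convention are assumptions made for the example only, not features of the processing unit 20.

```python
import numpy as np

def is_pointing_gesture(joints, threshold=0.55):
    """Return True if either hand projects well beyond the torso.

    joints    -- dict of joint name -> (x, y, z) position in metres, as might be
                 produced by a skeleton tracker (names here are illustrative).
    threshold -- horizontal distance from the torso centre, in metres, beyond
                 which an arm is treated as 'extended'.
    """
    torso = np.asarray(joints["torso"])
    for hand in ("left_hand", "right_hand"):
        offset = np.asarray(joints[hand]) - torso
        if np.linalg.norm(offset[[0, 2]]) > threshold:  # ignore the vertical component
            return True
    return False

joints = {"torso": (0.0, 1.0, 2.0), "left_hand": (0.7, 1.2, 2.0), "right_hand": (0.1, 0.8, 2.0)}
print(is_pointing_gesture(joints))  # True: the left hand is extended sideways
```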

Still referring to FIGS. 2 and 3, a display device 24 is provided to display the virtual environment with the photorealistic 3D representations of remote participants within the virtual environment. As shown in FIG. 3, the display device 24 may be a plurality of flat screens 24A arranged to surround the participant by 360 degrees, with the participant being immersively located in the center of the display device(s) 24. This represents a cost-effective solution by which computer monitors or digital television sets are used. FIG. 4 shows that the display device 24 may be a hemispherical screen used with appropriate projectors, as an exemplary embodiment. The hemispherical screen is shown as 24B while the projector is shown as 24C. For instance, the hemispherical screen 24B and projector 24C are as described in U.S. Pat. No. 6,905,218 to Courchesne (also shown in FIG. 9). Other configurations include a cylindrical screen (FIG. 6), spherical displays and domes (FIG. 10), cycloramas, a handheld screen (FIG. 7), and head-mounted displays (FIG. 8) with orientation tracking and inertial sensors, etc.

Speakers 25 may also be a peripheral of the station 12. The speakers 25 may be those of the television sets in the example of FIG. 3 or may alternatively be simple stand-alone speakers, for example. It is desired to have multiple sound channels feeding multiple speakers 25 to provide some form of stereo effect or spatialized sound related to the location of the various participants in the virtual environment. Indeed, the relative relation between participants in the virtual environment session A can be rendered in terms of sound. For instance, the amplitude of the sound emitted by the speakers 25 may be adjusted as a function of the position of the sound source (i.e., the participant) within the virtual environment session A. The speakers may be replaced by headphones, for instance equipped with orientation tracking to simulate the virtual 3D sound environment.

Referring to FIG. 2, the processing unit 20 has a plurality of modules to treat the captured data and transmit it to the virtual environment session A. Likewise, the processing unit 20 has modules by which the data received from the virtual environment session, i.e., from the virtual environment server 14, is output in the form of images and sound, to reproduce the virtual environment session A.

The image processor 20A, also referred to as a point cloud processor, receives the raw or processed image data obtained by the capture devices 21. The image data may be a cloud of points with chromatic and luminance information, or may be raw images that will be processed by the image processor 20A to be converted into a 3D representation of the user in the form of a point cloud. The image processor 20A may perform some form of filtering, cropping, merging and modelling (e.g., image data fusion or point cloud imaging, meshing, texturing). In an example, the image data is a cloud of points, and the image processor 20A receives this point cloud output from multiple 3D capture devices 21, for instance from the three shown in FIG. 3, whereby some of the points will be redundant. The image processor 20A may process this raw data to produce a smoothened model of the participant. The image processor 20A may thus create a photorealistic model, or may produce different effects, such as a cartoon-like representation of the participant. As mentioned above, one possibility is that the image data is captured in the form of a stream of a point cloud in an x, y and z coordinate system, with the points comprising additional data such as colour (RGB), etc., as required as a function of the desired quality of the model. In an effort to reduce bandwidth use, these point clouds (data sets) can be transformed into a mesh onto which an image texture obtained from the capture devices 21 may be applied, for adequate image resolution and smaller data sets. The image processor 20A and the capture devices 21 operate concurrently to continuously update the model, such that the model is a real-time representation of the participant and his/her position within the shared environment. “Continuously” and related expressions are used hereinafter to describe an action repeated at a given frequency over an amount of time, i.e., during the virtual environment session. While a continuous action may be interrupted during the virtual environment session, it will nonetheless be maintained at a given frequency.

According to an embodiment, the point cloud processor 20A receives updated point clouds from each 3D capture device 21. Each point cloud goes through a first pass of clipping, to keep only the region of interest: the participant and his/her surroundings. The surroundings (featuring the virtual environment in which the participant is immersed) may be kept to enable a correct and precise calibration of the multiple point clouds. A second pass may then be applied, in which a geometrical transformation derived from the calibration is executed on the point clouds. An optional auto-calibration may be used to enhance the input calibration. A third pass may be performed to crop the resulting point cloud to keep only the point cloud of the participant. Finally, this point cloud is sent to the data transmitter 20D either as is, or compressed through a reduction of the point cloud density (points close to each other are merged relative to a distance parameter) and/or through a lossless compression algorithm. An additional transformation may be applied to the point cloud dataset to create a mesh onto which an image texture is applied. This texture may be captured live by the same point cloud/RGB capture devices or pre-recorded and used as wearable skins.
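A minimal sketch of the passes described above (clipping to a region of interest, applying the calibration transform, cropping to the participant, and optionally reducing the point density) is given below. It assumes NumPy arrays for the point clouds and axis-aligned boxes for the regions of interest; the actual passes of the image processor 20A are not limited to this form.

```python
import numpy as np

def clip_to_box(points, colors, lo, hi):
    """Keep only points inside an axis-aligned box (the region of interest)."""
    mask = np.all((points >= lo) & (points <= hi), axis=1)
    return points[mask], colors[mask]

def apply_calibration(points, T):
    """Second pass: rigid transform from the per-device calibration (4x4 matrix)."""
    homo = np.hstack([points, np.ones((points.shape[0], 1), dtype=points.dtype)])
    return (homo @ T.T)[:, :3]

def reduce_density(points, colors, cell=0.01):
    """Keep one representative point per cubic cell ('cell' acts as the distance parameter)."""
    keys = np.floor(points / cell).astype(np.int64)
    _, first = np.unique(keys, axis=0, return_index=True)
    return points[first], colors[first]

def process_device_frame(points, colors, T, roi_box, participant_box):
    pts, rgb = clip_to_box(points, colors, *roi_box)        # first pass: region of interest
    pts = apply_calibration(pts, T)                          # second pass: calibration
    pts, rgb = clip_to_box(pts, rgb, *participant_box)       # third pass: participant only
    return reduce_density(pts, rgb)                          # optional density reduction

rng = np.random.default_rng(2)
pts = rng.uniform(-2.0, 2.0, size=(5000, 3)).astype(np.float32)
rgb = rng.integers(0, 256, size=(5000, 3), dtype=np.uint8)
roi = (np.array([-1.5, -1.5, -1.5]), np.array([1.5, 1.5, 1.5]))
person = (np.array([-0.5, -1.5, -0.5]), np.array([0.5, 1.5, 0.5]))
out_pts, out_rgb = process_device_frame(pts, rgb, np.eye(4, dtype=np.float32), roi, person)
print(out_pts.shape, out_rgb.shape)
```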

Considering that the station 12 must share image data with a virtual environment server 14, it may be desirable to reduce as much as possible the bandwidth necessary while maintaining a photorealistic representation of the participant. In an embodiment, additional processing is performed on the cloud of points to reduce the size of the data set sent to a remainder of the system 10, while maintaining adequate photorealism. The cloud of points produced by the 3D capture devices 21 is a noisy point cloud. The point cloud produced by the 3D capture devices 21 is therefore downsampled (i.e., the sample density of the 3D point cloud is reduced) and filtered to get rid of some of the noise. For example, as mentioned above in the case in which the image data is a 3D point cloud, there are some redundant points, and the downsampling and filtering allow a reduction of the number of points, and hence computation time is reduced. Different techniques may be used for the filtering, including existing techniques such as the VoxelGrid filter. Subsequently, the downsampled point cloud is used to generate a coarse mesh, i.e., a set of vertices, edges and faces that define the shape of a geometric model. For example, the Marching Cubes method may be used to generate the mesh, the Marching Cubes method being an algorithm dedicated to the extraction of a polygonal mesh from a three-dimensional scalar field. This technique may be combined with an average-position approach to create a smooth surface and reduce the discontinuities present on the surface. This smoothing step lessens the blocky appearance resulting from the use of Marching Cubes.
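The downsampling and outlier filtering steps may, for instance, be sketched as follows with plain NumPy, using a VoxelGrid-style averaging filter and a simple k-nearest-neighbour outlier test; the Marching Cubes meshing step is omitted here, and a production implementation would typically rely on dedicated point cloud libraries.

```python
import numpy as np

def voxel_downsample(points, voxel=0.02):
    """Average all points that fall in the same voxel (a VoxelGrid-style filter)."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    inverse = np.asarray(inverse).reshape(-1)
    sums = np.zeros((counts.size, 3), dtype=np.float64)
    np.add.at(sums, inverse, points)
    return (sums / counts[:, None]).astype(points.dtype)

def remove_outliers(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours is abnormally large."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)  # O(N^2); small clouds only
    mean_knn = np.sort(dists, axis=1)[:, 1:k + 1].mean(axis=1)               # column 0 is the self-distance
    keep = mean_knn < mean_knn.mean() + std_ratio * mean_knn.std()
    return points[keep]

rng = np.random.default_rng(1)
noisy = rng.normal(scale=0.5, size=(1000, 3)).astype(np.float32)
coarse = voxel_downsample(noisy)
clean = remove_outliers(coarse)
print(noisy.shape[0], "->", coarse.shape[0], "->", clean.shape[0])
```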

A refining step is then performed to combine the coarse mesh with the point cloud. As the mesh was created from the downsampled point cloud, this refining uses the original point cloud to restore minute details lost in the coarse meshing, by essentially projecting the chromatic data onto the coarse mesh. By way of example, a signed distance field (SDF) is used to update the coarse mesh with the input point cloud. The direction of the displacement applied is based on the normal vector of the nearby vertex. Before updating the mesh, polygons having a higher density of points are subdivided, the polygons being surfaces bound by straight line segments. The polygons are subdivided using the input point cloud. As a result, the final mesh increases its faithfulness to the fine details.
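As a simplified illustration of projecting the chromatic data of the original point cloud onto the coarse mesh, each mesh vertex may simply take the colour of its nearest input point, as sketched below; the signed distance field displacement and the polygon subdivision described above are left out, and the function shown is illustrative rather than taken from the present system.

```python
import numpy as np

def project_colors_to_mesh(vertices, cloud_points, cloud_colors):
    """Assign to every mesh vertex the colour of its nearest point in the original cloud."""
    colors = np.empty((vertices.shape[0], 3), dtype=cloud_colors.dtype)
    for i, v in enumerate(vertices):
        nearest = np.argmin(np.linalg.norm(cloud_points - v, axis=1))
        colors[i] = cloud_colors[nearest]
    return colors

verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pts = np.array([[0.1, 0.0, 0.0], [0.9, 0.1, 0.0]])
rgb = np.array([[255, 0, 0], [0, 255, 0]], dtype=np.uint8)
print(project_colors_to_mesh(verts, pts, rgb))  # red for the first vertex, green for the second
```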

These steps are performed by the image processor 20A such that the signal sent to the data transmitter 20D is substantially reduced in terms of bandwidth. It is also considered to add a mesh decimation (i.e., a reduction of the vertex density where there is less detail) to keep details only where needed and to reduce the size of the mesh. The texture is then obtained by projecting the images (RGB) from the capture devices 21 onto the mesh. The blending between the multiple images can be improved by matching detected features.

The audio processor 20B will receive the sound from the audio capture device 22. The audio processor 20B may also attribute positional data to the sound, for instance if multiple audio capture devices 22 are provided. Moreover, the audio processor 20B may perform some filtering to remove noise.

A command processor 20C receives the commands from the interface 23, whether the interface 23 is a discrete device or movements performed by the participant. In the latter embodiment, the command processor 20C will recognize such movements and interpret them as desired interface actions from the participant.

The image stream provided by the image processor 20A and the audio stream provided by the audio processor 20B must be synchronized, to avoid or reduce any lag between image and sound in the virtual reality session A. In an embodiment, the real-time processing by modules 20A and 20B results in the image stream and the audio stream being synchronized. The data from the modules 20A, 20B and possibly 20C is provided to the data transmitter 20D, which will telecommunicate the data to the virtual reality session A, i.e., via the other participating stations 12 and the virtual reality server 14. According to an embodiment, the data transmitter 20D produces a composite signal, for instance along multiple channels, incorporating the various streams of image and audio data, with synchronization therebetween. The data transmitter 20D may be wired or wireless, with any adequate components, such as encoders, compressors, etc. Any adequate protocol may be used by the data transmitter module 20D. For simplicity, the data transmitter module 20D uses the internet to communicate with the stations 12 and/or the server 14, and the composite signal is compressed.

The data transmitter 20D receives inputs from the point cloud, audio and command processors. The multiple inputs can be sent independently, or packed together to ensure synchronization whatever the characteristics of the network. Also, every source/packet can be sent to a multiplicity of receivers, allowing the data to be used locally as well as on any given number of distant stations.
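A possible, purely illustrative way to pack one synchronized frame (point cloud bytes, audio bytes and commands) into a single compressed message is sketched below; the present disclosure does not prescribe a wire format, and the JSON header, field names and compression choice are assumptions of the example.

```python
import json, struct, time, zlib

def pack_frame(point_bytes, audio_bytes, commands):
    """Pack one synchronized frame of capture data into a single compressed message."""
    header = json.dumps({
        "timestamp": time.time(),
        "points_len": len(point_bytes),
        "audio_len": len(audio_bytes),
        "commands": commands,
    }).encode("utf-8")
    body = struct.pack("!I", len(header)) + header + point_bytes + audio_bytes
    return zlib.compress(body)

def unpack_frame(message):
    """Recover the header and the two raw payloads on the receiving side."""
    body = zlib.decompress(message)
    (hlen,) = struct.unpack("!I", body[:4])
    header = json.loads(body[4:4 + hlen])
    offset = 4 + hlen
    points = body[offset:offset + header["points_len"]]
    audio = body[offset + header["points_len"]:offset + header["points_len"] + header["audio_len"]]
    return header, points, audio

msg = pack_frame(b"\x00" * 1200, b"\x01" * 480, {"move": [0.1, 0.0, 0.0]})
hdr, pts, aud = unpack_frame(msg)
print(hdr["commands"], len(pts), len(aud))  # {'move': [0.1, 0.0, 0.0]} 1200 480
```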

Still referring to FIG. 2, a data receiver 20E receives data from the virtual environment session A. For instance, the data receiver 20E receives a virtual environment data stream by which the virtual environment will be reproduced on the display devices 24 of the station 12. Moreover, if other participants are in other stations 12 in the arrangement of FIG. 1, the data receiver 20E will receive image and audio data streams therefrom for the telerepresentation of the participants in the virtual environment. The data receiver 20E receives and unpacks the packets sent by the distant stations, and makes them available for use by the rendering engine and the audio driver, or any other local unit.

A rendering engine 20F is connected to the data receiver 20E and is used to produce on the display device(s) 24 the virtual environment incorporating the telerepresentations of the various participants. It is pointed out that the image processor 20A may be connected to the rendering engine 20F directly, such that a participant may see his/her own telerepresentation in the virtual environment as produced by his/her own station 12. The rendering engine 20F may have 3D rendering capability to produce 3D images of the participants, and may perform some image treatment (e.g., color adjustment, contrast adjustment, rendering). In an embodiment, the image treatment of models of remote participants is commanded by the local participant via the interface 23.

An audio driver 20G is connected to the speakers 25. The audio driver 20G receives the audio data stream from the data receiver 20E. In an embodiment, the audio data stream received by the audio driver 20G is identified with channel data, so as to produce a depth effect for the sound that is output by the speakers 25. For instance, there may be a channel per speaker, and the audio driver 20G is configured to output sound in the appropriate channels, as a function of the spatial location of the sound source within the virtual environment.
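For instance, per-channel gains may be derived from the position of the sound source relative to the listener, as in the simplified sketch below; the speaker layout, roll-off law and directional weighting are illustrative assumptions, not the prescribed behaviour of the audio driver 20G.

```python
import numpy as np

def speaker_gains(source_pos, listener_pos, speaker_dirs):
    """Compute a gain per speaker from the position of a sound source.

    source_pos, listener_pos -- (x, y, z) in the virtual environment's coordinates.
    speaker_dirs             -- unit vectors from the listener toward each physical
                                speaker of the station (illustrative layout).
    Gains combine a distance roll-off with a simple directional weighting.
    """
    to_source = np.asarray(source_pos, dtype=float) - np.asarray(listener_pos, dtype=float)
    distance = np.linalg.norm(to_source)
    direction = to_source / distance if distance > 0 else np.zeros(3)
    rolloff = 1.0 / max(distance, 1.0)                        # amplitude falls with distance
    dots = np.clip(np.asarray(speaker_dirs) @ direction, 0.0, 1.0)
    return rolloff * dots

# Four speakers around the listener: front, right, back, left.
dirs = np.array([[0, 0, 1], [1, 0, 0], [0, 0, -1], [-1, 0, 0]], dtype=float)
print(speaker_gains((2.0, 0.0, 2.0), (0.0, 0.0, 0.0), dirs))
```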

In similar fashion to the image processor 20A and rendering engine 20F, the audio processor 20B and the audio driver 20G may communicate directly within the processing unit 20. The command processor 20C may also be in direct contact with the rendering engine 20F and the audio driver 20G, for instance to allow control functions to be performed by the local participant on the image and sound.

Referring to FIG. 5, there is illustrated an embodiment of the virtual environment server 14. According to this embodiment, the virtual environment server 14 has different modules, different servers, or any other appropriate configuration. For simplicity, reference is made to a single server 14, although other configurations are contemplated.

The virtual environment server 14 has a virtual environment manager 40. The virtual environment manager 40 provides the data stream representing the shared virtual environment (i.e., the background or context). This shared virtual environment may be based locally, with only the changes of its configuration being transmitted on the network to other stations. The data stream may thus comprise image and/or sound data related to the virtual environment. For instance, if the virtual environment is a representation of a real location, the data stream may be an actual video and audio stream of the real location. According to an embodiment, the data stream is a live or quasi-live feed of a real location, with the virtual environment manager 40 performing some signal processing to telecommunicate the feed in an appropriate format.

The virtual environment session A may have spatialization management capability. Hence, the virtual environment manager 40 manages a 3-axis coordinate system of the shared virtual environment (including a “north”, i.e., a reference orientation), and thus monitors and stores the position and orientation of the participant representations relative to the virtual environment. In an embodiment, each discrete item in the virtual environment session A has a node (or family of nodes) representing the position of the discrete item in the coordinate system. In addition to a node (or family of nodes), each discrete item may have an orientation relative to a reference orientation of the 3-axis coordinate system of the virtual environment. Any movement of an object (i.e., change in node coordinates) is monitored and stored, and shared with the stations 12 to update the images of the virtual reality session A. Likewise, the 3D representations may be provided with a user specific position and/or orientation when received from the stations, such that the user may control his/her location in the virtual environment. While the 3D representations are continuously updated, the user specific position and/or orientation may be provided occasionally (or punctually) as part of the 3D representation stream, for instance at calibration and/or when there is a variation in the user's position and/or orientation in the virtual environment.
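A minimal sketch of how the virtual environment manager 40 might store such nodes and apply punctual position/orientation updates is given below; the class and field names are illustrative only and are not taken from the present disclosure.

```python
from dataclasses import dataclass

@dataclass
class SceneNode:
    """Position and orientation of one item (participant or object) in the shared
    coordinate system. Field names are illustrative."""
    item_id: str
    position: tuple = (0.0, 0.0, 0.0)   # x, y, z in the shared frame
    yaw_deg: float = 0.0                # orientation relative to the reference "north"

class VirtualEnvironmentState:
    def __init__(self):
        self.nodes = {}

    def apply_update(self, item_id, position=None, yaw_deg=None):
        """Punctual update: only sent when a participant moves or re-calibrates."""
        node = self.nodes.setdefault(item_id, SceneNode(item_id))
        if position is not None:
            node.position = tuple(position)
        if yaw_deg is not None:
            node.yaw_deg = yaw_deg % 360.0
        return node

state = VirtualEnvironmentState()
state.apply_update("participant-1", position=(1.0, 0.0, -2.5))
state.apply_update("participant-1", yaw_deg=90.0)
print(state.nodes["participant-1"])
```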

A virtual environment control unit 41 may also be provided. The virtual environment control unit 41 is typically used by the operator of the virtual environment session A. The control unit 41 therefore allows management of the virtual environment session A, for instance in terms of login, troubleshooting, technical support, and to support the use of the system 10 by independent stations 12. Operations performed by the control unit 41 may also include calibrating, stream monitoring, access privileges/priority management, etc.

Finally, the asset manager 42 may be equipped with a database that keeps updated data of the various virtual environment sessions A, when such sessions are permanent or semi-permanent (persistent worlds). For instance, if the virtual environment session A is modified or transformed by the participants or by internal generative algorithms, the asset manager 42 may keep a record of the modifications, to restore the virtual environment session A for future uses. In an embodiment, the virtual environment may be created by the participants or by internal generative algorithms and thus stored in the asset manager 42. The virtual environment manager 40 comprises numerous modules similar to those of the stations (e.g., a data transmitter module, a data receiver module), which are not shown in the figures for simplicity.

While the above description involves a client-server network with the virtual reality server 14, it is contemplated to operate virtual environment sessions in a peer-to-peer arrangement between stations 12. In such cases, the various functions and modules of the server 14 would be supplied by the stations 12.

Referring to FIG. 12, there is illustrated a mirror function occurring at the stations 12. The mirror function is one of a few techniques considered to calibrate a participant's orientation in the virtual environment. Indeed, it is desirable to share a reference orientation, i.e., a common north, such that the physical orientation of the user in his/her local station 12 is converted into a virtual orientation of the user in the virtual environment, which virtual orientation is common to all participants in the virtual environment session. Hence, the spatialization managed by the server 14 may include the reference orientation in addition to the x, y and z position. By calibrating the orientation of the participants relative to the reference orientation, it is possible for a participant to orient himself/herself to be face to face with another participant. Likewise, by such a calibration, a participant may point to an item and the other participants will be capable of observing what the first participant is pointing at.

In FIG. 12, a mirror image 50 of the participant is produced locally, using the image data captured by the capture devices 21. The image processor 20A and the rendering engine 20F may thus output this virtual mirror image of the participant. The participant is then required to adjust an orientation of his/her mirror image until a predefined orientation is reached: for example, when his/her own mirror image is face to face with the participant. When the predefined orientation is reached, the command processor 20C receives a confirmation from the participant, whether by using the interface 23 or by performing recognizable movements. The orientation of the mirror image is then sent to the server 14 as part of the 3D representation, and the server 14 may calibrate the participant's orientation relative to the reference orientation using this data.
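The calibration against the reference orientation may, for example, reduce to computing a yaw offset at the instant the participant confirms the face-to-face condition, as in the following sketch; the 180° face-to-face convention and the degree-based representation are assumptions of the example rather than requirements of the mirror function.

```python
def calibrate_orientation(physical_yaw_at_confirm, face_to_face_yaw=180.0):
    """Return the offset that converts locally captured yaw into the shared frame.

    physical_yaw_at_confirm -- heading (degrees) of the participant in the station,
                               measured when he/she confirms that the mirror image
                               is face to face.
    face_to_face_yaw        -- yaw, in the shared frame, corresponding to facing
                               one's own mirror image (illustrative convention).
    """
    return (face_to_face_yaw - physical_yaw_at_confirm) % 360.0

def to_shared_frame(physical_yaw, offset):
    return (physical_yaw + offset) % 360.0

offset = calibrate_orientation(physical_yaw_at_confirm=30.0)
print(to_shared_frame(30.0, offset))   # 180.0: face to face at the moment of confirmation
print(to_shared_frame(120.0, offset))  # later headings follow the same mapping
```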

Other methods are also possible to calibrate the orientation, for instance by projecting a reference orientation landmark (e.g., a vertical line) in the virtual environment for local display. The participant is then requested to orient himself/herself to face the reference orientation landmark; upon a standstill of a few seconds, the image data is recorded relative to the reference orientation. Yet another method considered to calibrate the orientation is to perform some form of morphological recognition of the participant's anatomy. For example, the image processor 20A may be programmed to recognize the participant's eyes and then relate the orientation of the participant relative to the reference orientation based on this recognition.
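By way of illustration, once the participant's eyes have been located, a facing direction can be derived from the axis joining them, as sketched below; the coordinate convention (y up, the reference orientation along +z) is an assumption of the example, not a requirement of the morphological recognition described above.

```python
import numpy as np

def yaw_from_eyes(left_eye, right_eye):
    """Estimate the participant's facing direction from detected eye positions.

    The eyes define a left-to-right axis; rotating that axis by 90° about the
    vertical gives the facing direction, whose angle against the reference
    orientation (taken here as the +z axis) is returned in degrees.
    """
    left, right = np.asarray(left_eye, float), np.asarray(right_eye, float)
    across = right - left
    facing = np.array([across[2], 0.0, -across[0]])  # 90° rotation in the horizontal plane
    return float(np.degrees(np.arctan2(facing[0], facing[2])) % 360.0)

# Eyes level along x: under this convention the participant faces -z (180° from the reference).
print(yaw_from_eyes((-0.03, 1.6, 0.0), (0.03, 1.6, 0.0)))
```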

According to another embodiment, the 3D representation communicated by the data transmitter 20D is a point cloud without chromatic and/or volumetric data, i.e., a skeleton. In such a case, the other local station 12 or the server 14 may provide chromatic and/or volumetric data to be projected and/or attached onto the skeleton. For example, other local stations may perform this chromatic and/or volumetric data projection based on pre-registered images/models of the participants. This arrangement may be suitable when a set of participants frequently participate in virtual reality sessions together, in order to reduce bandwidth requirements.

Referring to FIG. 13, a method for participating in a virtual environment session is generally shown at 60. The method 60 may for instance be performed by the processing unit 20 of any one of the stations 12. In the method 60:

61 comprises continuously receiving image data of a first user. 61 may also include continuously receiving audio data emitted at least by the first user and transmitting the audio data for at least a remote use in the virtual reality session, and continuously receiving and producing audio content of the second user received by the data receiver. The method 60 may also include receiving commands from the first user and transmitting the commands for interfacing with the virtual environment session; receiving at least punctually a user specific orientation of the second user and outputting for display the 3D representation of the second user oriented relative to the virtual environment based on the user specific orientation of the second user; outputting a reference orientation landmark for calibration of the first user's orientation in the virtual environment, the user specific orientation of the first user being based on the reference orientation landmark; and outputting a mirror image of the 3D representation of the first user as the reference orientation landmark, the mirror image being orientable to set the user specific orientation of the first user. In 61, continuously receiving image data may comprise continuously receiving a 3D point cloud of the first user, and continuously receiving the 3D point cloud of the first user may comprise receiving chromatic data for points of the point cloud.

62 comprises continuously producing a first 3D representation of the first user based on the image data, the first 3D representation having at least punctually a user specific position in a virtual environment. In 62, continuously producing the first 3D representation of the first user may comprise downsampling the 3D point cloud by removing redundant points, filtering the 3D point cloud by removing outlier points, creating a 3D mesh using the 3D point cloud, and projecting chromatic data of the 3D point cloud on the 3D mesh.

63 comprises transmitting the first 3D representation for at least a remote use in the virtual environment session;

64 comprises continuously receiving a second 3D representation of a second user, image data for the virtual environment, and at least punctually receiving user specific position of the second 3D representation relative to the virtual environment;

65 comprises outputting for display at least the 3D representation of the second user positioned relative to the first user as inserted in the virtual environment based on said user specific positions.

The invention claimed is:
 1. An apparatus for capture-based telepresence in a virtual environment session, comprising: at least one three-dimensional (3D) capture device to continuously capture image data of a first user; a processing unit comprising at least: an image processor for continuously producing a first 3D representation of the first user based on the image data and for calibrating a user specific orientation as a function of a reference orientation of a virtual environment, the calibrating including determining the user specific orientation from a morphology of the first user's anatomy, the first 3D representation having at least punctually a user specific position and said user specific orientation in a virtual environment; a data transmitter for transmitting the first 3D representation for at least a remote use in the virtual environment session; a data receiver for continuously receiving a second 3D representation of a second user, image data for the virtual environment, and at least punctually receiving user specific position and user specific orientation of the second 3D representation relative to the virtual environment, the user specific orientation of the second user being calibrated so as to be a function of the reference orientation of the virtual environment; and a rendering engine for outputting at least the 3D representation of the second user positioned relative to the first user as inserted in the virtual environment based on said user specific positions and on said user specific orientations; at least one display device for displaying the output.
 2. The apparatus according to claim 1, wherein the at least one display device comprises an immersive image screen deployed in up to 360° around the participant.
 3. The apparatus according to claim 2, wherein the at least one display device is one of a hemispherical screen, a frusto-spherical screen, a cylindrical screen, a set of flat screens, a tablet and a head-mounted display with orientation tracking, with the first user physically located in a central portion of said at least one display device.
 4. The apparatus according to claim 1, comprising three of the 3D capture devices, with the first user physically located in a central portion relative to the 3D capture devices surrounding the first user.
 5. The apparatus according to claim 1, further comprising: an audio capture device to continuously capture audio data emitted at least by the first user and for transmission via the data transmitter for at least a remote use in the virtual environment session; an audio driver for producing audio content of the second user received by the data receiver; and speakers for outputting the audio content.
 6. The apparatus according to claim 5, wherein the audio capture device and the speakers are part of a multi-channel audio system deployed in up to 360° around the participant.
 7. The apparatus according to claim 1, further comprising a command processor to receive commands from the first user and for transmission via the data transmitter for interfacing with the virtual environment session.
 8. The apparatus according to claim 7, further comprising an interface for communicating with the command processor.
 9. The apparatus according to claim 1, wherein the rendering engine outputs a reference orientation landmark for calibration of the first user's morphology relative to the reference orientation in the virtual environment, and wherein the user specific orientation of the first user is based on the reference orientation landmark.
 10. The apparatus according to claim 9, wherein the rendering engine outputs a mirror image of the processed 3D representation of the first user as the reference orientation landmark, the mirror image being orientable to set the user specific orientation of the first user.
 11. The apparatus according to claim 1, wherein the at least one 3D capture device continuously captures a 3D point cloud of the first user.
 12. The apparatus according to claim 11, wherein the at least one 3D capture device continuously captures the 3D point cloud of the first user with chromatic and luminance data.
 13. The apparatus of claim 1, wherein the image processor calibrates user specific orientation by performing morphological recognition of the first user's anatomy.
 14. A system for operating a virtual environment session, comprising: at least two of the apparatus of claim 1; and a virtual environment server comprising: a virtual environment manager for providing the image data for the virtual environment and for managing a coordinate system of a virtual environment based at least on the user specific position and orientation of the users in the virtual environment.
 15. The system according to claim 14, wherein the virtual environment server further comprises an asset manager for recording and storing modifications to the virtual environment.
 16. The system according to claim 14, wherein the virtual environment server further comprises a control unit for administrating virtual environment sessions, the administrating comprising at least one of login, troubleshooting, technical support, calibrating, events synchronisation, stream monitoring, access privileges/priority management.
 17. A method for participating in a virtual environment session comprising: continuously receiving image data of a first user; calibrating the image data of the first user to obtain a user specific orientation as a function of a reference orientation of a virtual environment, the calibrating including determining the user specific orientation from a morphology of the first user's anatomy; continuously producing a first 3D representation of the first user based on the image data, the first 3D representation having at least punctually a user specific position and said user specific orientation in a virtual environment; transmitting the first 3D representation for at least a remote use in the virtual environment session; continuously receiving a second 3D representation of a second user, image data for the virtual environment, and at least punctually receiving user specific position and user specific orientation of the second 3D representation relative to the virtual environment, the user specific orientation of the second user being calibrated so as to be a function of the reference orientation of the virtual environment; and outputting for display at least the 3D representation of the second user positioned relative to the first user as inserted in the virtual environment based on said user specific positions and on said user specific orientation.
 18. The method according to claim 17, further comprising outputting a reference orientation landmark for calibration of the first user's morphology relative to the reference orientation in the virtual environment, the user specific orientation of the first user based on the reference orientation landmark.
 19. The method according to claim 18, wherein outputting the reference orientation landmark comprises outputting a mirror image of the 3D representation of the first user as the reference orientation landmark, the mirror image being orientable to set the user specific orientation of the first user.
 20. The method according to claim 18, wherein calibrating the image data of the first user includes performing morphological recognition of the first user's anatomy.